For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia
نویسندگان
چکیده
We report on work in progress on extracting lexical simplifications (e.g., “collaborate” → “work together”), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find our methods to outperform a reasonable baseline and yield many high-quality lexical simplifications not included in an independently-created manually prepared list.
منابع مشابه
Learning a Lexical Simplifier Using Wikipedia
In this paper we introduce a new lexical simplification approach. We extract over 30K candidate lexical simplifications by identifying aligned words in a sentencealigned corpus of English Wikipedia with Simple English Wikipedia. To apply these rules, we learn a feature-based ranker using SVMrank trained on a set of labeled simplifications collected using Amazon’s Mechanical Turk. Using human si...
متن کاملSimpleScience: Lexical Simplification of Scientific Terminology
Lexical simplification of scientific terms represents a unique challenge due to the lack of a standard parallel corpora and fast rate at which vocabulary shift along with research. We introduce SimpleScience, a lexical simplification approach for scientific terminology. We use word embeddings to extract simplification rules from a parallel corpora containing scientific publications and Wikipedi...
متن کاملExtracting Lexical Reference Rules from Wikipedia
This paper describes the extraction from Wikipedia of lexical reference rules, identifying references to term meanings triggered by other terms. We present extraction methods geared to cover the broad range of the lexical reference relation and analyze them extensively. Most extraction methods yield high precision levels, and our rule-base is shown to perform better than other automatically con...
متن کاملExtraction and 3D Segmentation of Tumors-Based Unsupervised Clustering Techniques in Medical Images
Introduction The diagnosis and separation of cancerous tumors in medical images require accuracy, experience, and time, and it has always posed itself as a major challenge to the radiologists and physicians. Materials and Methods We Received 290 medical images composed of 120 mammographic images, LJPEG format, scanned in gray-scale with 50 microns size, 110 MRI images including of T1-Wighted, T...
متن کاملUnsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web
This paper presents an unsupervised relation extraction method for discovering and enhancing relations in which a specified concept in Wikipedia participates. Using respective characteristics of Wikipedia articles and Web corpus, we develop a clustering approach based on combinations of patterns: dependency patterns from dependency analysis of texts in Wikipedia, and surface patterns generated ...
متن کامل